Abstract: For the retrieval dynamics of sparsely coded attractor
associative memory models with synaptic noise the inclusion
of a macroscopic time-dependent threshold is studied. It is shown 
that if the threshold is chosen appropriately as a function of the 
cross-talk noise and of the activity of the memorized patterns, 
adapting itself automatically in the course of the time evolution, 
an autonomous functioning of the model is guaranteed. This self- 
control mechanism considerably improves the quality of the fixed- 
point retrieval dynamics, in particular the storage capacity, the 
basins of attraction and the mutual information content. 



I. Introduction 

Efficient neural network modelling requires an autonomous 
functioning independent from external constraints or control 
mechanisms. For fixed-point retrieval by an attractor asso- 
ciative memory model this requirement is mainly expressed 
by the robustness of its learning and retrieval capabilities 
against external noise, against malfunctioning of some of the 
connections and so on. Indeed, a model which embodies this 
robustness is able to perform as a content-addressable memory
having large basins of attraction for the memorized patterns. 
Intuitively, one can imagine that these basins of attraction 
become smaller when the storage capacity gets larger. This
might occur, e.g., in sparsely coded models (Okada, 1996 and 
references cited therein). Therefore, the necessity of a control 
of the activity of the neurons has been emphasized such that 
the latter stays the same as the activity of the memorized 
patterns during the recall process. This has led to several 
discussions imposing external constraints on the dynamics. 
However, the enforcement of such a constraint at every time 
step destroys part of the autonomous functioning of the 
network. To solve this problem, quite recently, a self-control 
mechanism has been introduced in the dynamics through the 
introduction of a time-dependent threshold in the transfer 
function (Dominguez & Bolle, 1998; Bolle, Dominguez & 
Amari, 2000). This threshold is determined as a function of
both the cross-talk noise and the activity of the memorized 
patterns in the network and adapts itself in the course of the 
time evolution. 

Up to now only neural network models without synaptic 
noise have been considered in this context. The purpose of 
the present work is precisely to generalise this self-control 
mechanism when synaptic noise is allowed. 



II. The model 
Let us consider a network of $N$ binary neurons. At a discrete
time step $t$ the neurons $\sigma_{i,t} \in \{0,1\}$, $i = 1,\ldots,N$, are
updated synchronously according to the rule

$$\sigma_{i,t+1} = F_{\theta_t,\beta}(h_{i,t}), \qquad h_{i,t} = \sum_{j(\neq i)}^{N} J_{ij}(\sigma_{j,t} - a), \tag{1}$$

where $J_{ij}$ are the synaptic couplings, $a$ is the activity of the
memorized patterns and $h_{i,t}$ is usually called the "local field"
of neuron $i$ at time $t$. In general, the transfer function $F_{\theta_t,\beta}$ can
be a monotonic function with $\theta_t$ a time-dependent threshold.
Later on it will be chosen as

$$F_{\theta_t,\beta}(x) = \frac{1}{2}\left[1 + \tanh(\beta(x - \theta_t))\right]. \tag{2}$$

The "temperature" $\beta = 1/T$ controls the thermal fluctuations,
which are a measure for the synaptic noise (Hertz et al., 1991). 
In the sequel, for theoretical simplicity in the methods used, 
the number of neurons N will be taken to be sufficiently large. 

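The synaptic couplings $J_{ij}$ themselves are determined by
the covariance rule

$$J_{ij} = \frac{C_{ij}}{C\tilde{a}} \sum_{\mu=1}^{P} (\xi_i^\mu - a)(\xi_j^\mu - a) \quad \text{for } i \neq j, \qquad \tilde{a} \equiv a(1-a). \tag{3}$$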



The memorized patterns $\xi_i^\mu \in \{0,1\}$, $\mu = 1,\ldots,P$, are
independent identically distributed random variables (iidrv)
with respect to $i$ and $\mu$ chosen according to the probability
distribution

$$p(\xi_i^\mu) = a\,\delta(\xi_i^\mu - 1) + (1-a)\,\delta(\xi_i^\mu). \tag{4}$$

The coefficients $C_{ij} \in \{0,1\}$ are iidrv with probability

$$\Pr\{C_{ij} = d\} = [1 - (C/N)]\,\delta_{d,0} + (C/N)\,\delta_{d,1}, \qquad \Pr\{C_{ij} = C_{ji}\} = (C/N)^2, \qquad (C/N) \ll 1, \quad C > 0. \tag{5}$$

This introduces the so-called extremely diluted asymmetric 
architecture with C measuring the average connectivity of the 
network (Derrida et al., 1987). 
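
As an illustration only, the construction of Eqs. (3)-(5) and one synchronous update of Eq. (1) can be sketched for a finite network. The function names, the NumPy random generator and the reading of the transfer function as the probability that a neuron fires are assumptions of this sketch and not part of the model definition; a finite $N$ also cannot reach the extreme-dilution limit used in the analysis.

```python
import numpy as np

def make_network(N, C, P, a, rng):
    """Sample patterns (Eq. 4), dilution coefficients (Eq. 5) and couplings (Eq. 3)."""
    xi = (rng.random((P, N)) < a).astype(float)      # xi[mu, i] in {0, 1}, activity a
    c = (rng.random((N, N)) < C / N).astype(float)   # C_ij and C_ji drawn independently
    np.fill_diagonal(c, 0.0)                         # couplings are defined for i != j only
    dev = xi - a
    J = c * (dev.T @ dev) / (C * a * (1.0 - a))      # covariance rule
    return xi, J

def synchronous_update(sigma, J, a, theta, beta, rng):
    """One step of Eq. (1); F is used as a firing probability (assumption)."""
    h = J @ (sigma - a)                                  # local fields h_i
    p_fire = 0.5 * (1.0 + np.tanh(beta * (h - theta)))   # Eq. (2)
    return (rng.random(sigma.shape) < p_fire).astype(float)

def overlap(sigma, xi_mu, a):
    """Retrieval overlap with one pattern, normalised by N a."""
    return float(xi_mu @ sigma) / (len(sigma) * a)
```

Starting the network in one of the stored patterns and applying `synchronous_update` once gives a quick check that the signal term of the local field dominates for small loading; the threshold is a fixed number here, not yet the adaptive one.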

At this point we remark that the couplings (3) are of infinite
range (each neuron interacts with infinitely many others) 
such that our model allows a so-called mean-field theory 
approximation. This essentially means that we focus on the 
dynamics of a single neuron while replacing all the other
neurons by an average background local field. In other words,
no fluctuations of the other neurons are taken into account, not 
even in response to changing the state of the chosen neuron. In 
our case this approximation becomes exact because, crudely 
speaking, $h_{i,t}$ is the sum of very many terms and a central
limit theorem can be applied (Hertz et al., 1991).

It is standard knowledge by now that synchronous mean- 
field theory dynamics can be solved exactly for these diluted 
architectures (e.g., Bolle, 2004). Hence, the big advantage
is that this will allow us to determine the precise effects 
from self-control in an exact way. We recall that the relevant 
parameters describing the solution of this dynamics are the 
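retrieval overlap, $m_t^\mu$, between the memorized pattern, $\xi_i^\mu$, and
the microscopic network state, $\sigma_{i,t}$, and the neural activity, $q_t$,
given by, respectively

$$m_t^\mu = \frac{1}{Na} \sum_{i} \xi_i^\mu \sigma_{i,t}, \qquad q_t = \frac{1}{N} \sum_{i} \sigma_{i,t}. \tag{6}$$

We remark that the $m_t^\mu$ are normalized parameters within the
interval $[0,1]$ which attain the maximal value 1 whenever the
model succeeds in a perfect recall, i.e., $\sigma_{i,t} = \xi_i^\mu$ for all $i$.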

In order to measure the retrieval quality of the recall process, 
we use the mutual information function (Bolle, Dominguez
& Amari, 2000; Nadal, Brunel & Parga, 1998; Schultz &
Treves, 1998 and references therein). In general, it measures 
the average amount of information that can be received by the 
user by observing the signal at the output of a channel (Blahut, 
1990; Shannon, 1948). For the recall process of memorized 
patterns that we are discussing here, at each time step the 
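process can be regarded as a channel with input $\xi_i^\mu$ and output
$\sigma_{i,t}$ such that this mutual information function can be defined
as (forgetting about the pattern index $\mu$ and the time index $t$)

$$I(\sigma_i; \xi_i) = S(\sigma_i) - \langle S(\sigma_i|\xi_i) \rangle_{\xi_i}, \tag{7}$$
$$S(\sigma_i) = -\sum_{\sigma_i} p(\sigma_i) \ln[p(\sigma_i)], \tag{8}$$
$$S(\sigma_i|\xi_i) = -\sum_{\sigma_i} p(\sigma_i|\xi_i) \ln[p(\sigma_i|\xi_i)]. \tag{9}$$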

Here $S(\sigma_i)$ and $S(\sigma_i|\xi_i)$ are the entropy and the conditional
entropy of the output, respectively. These information
entropies are peculiar to the probability distributions of the
output. The term $\langle S(\sigma_i|\xi_i) \rangle_{\xi_i}$ is also called the equivocation
term in the recall process. The quantity $p(\sigma_i)$ denotes the
probability distribution for the neurons at time $t$, while $p(\sigma_i|\xi_i)$
indicates the conditional probability that the $i$-th neuron is in
a state $\sigma_i$ at time $t$, given that the $i$-th pixel of the memorized
pattern that is being retrieved is $\xi_i$. Hereby, we have assumed
that the conditional probability of all the neurons factorizes,
i.e., $p(\{\sigma_i\}|\{\xi_i\}) = \prod_j p(\sigma_j|\xi_j)$, which is a consequence of
the mean-field theory character of our model explained above. 
We remark that a similar factorization has also been used in 
Schwenker et al. (1996). 

The calculation of the different terms in the expression (7)
proceeds as follows. Formally writing
$\langle O \rangle = \langle \langle O \rangle_{\sigma|\xi} \rangle_{\xi} = \sum_{\xi} p(\xi) \sum_{\sigma} p(\sigma|\xi)\, O$
for an arbitrary quantity $O$, the conditional
probability can be obtained in a rather straightforward
way by using the complete knowledge about the system:
$\langle \xi \rangle = a$, $\langle \sigma \rangle = q$, $\langle \sigma\xi \rangle = am$, $\langle 1 \rangle = 1$. The result reads
(we forget about the index $i$)



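$$p(\sigma|\xi) = [\gamma_0 + (m - \gamma_0)\xi]\,\delta(\sigma - 1) + [1 - \gamma_0 - (m - \gamma_0)\xi]\,\delta(\sigma), \qquad \gamma_0 = \frac{q - am}{1 - a}. \tag{10}$$

One can simply verify that this satisfies the averages

$$m = \frac{1}{a} \langle \langle \sigma \rangle_{\sigma|\xi}\, \xi \rangle_{\xi}, \qquad q = \langle \langle \sigma \rangle_{\sigma|\xi} \rangle_{\xi}, \tag{11}$$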



and those are precisely equal, for large $N$, to the parameters
$m$ and $q$ mentioned above (Eq. (6)). Using the probability
distribution of the patterns (Eq. (4)), we furthermore obtain

$$p(\sigma) = \sum_{\xi} p(\xi)\, p(\sigma|\xi) = q\,\delta(\sigma - 1) + (1 - q)\,\delta(\sigma). \tag{12}$$

Hence the expressions for the entropies defined above become 

$$S(\sigma) = -q \ln q - (1 - q) \ln(1 - q), \tag{13}$$
$$\langle S(\sigma|\xi) \rangle_{\xi} = -a[m \ln m + (1 - m) \ln(1 - m)] - (1 - a)[\gamma_0 \ln \gamma_0 + (1 - \gamma_0) \ln(1 - \gamma_0)]. \tag{14}$$

Recalling Eq. (7) this completes the calculation of the mutual
information content of the present model. 
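
The entropies in Eqs. (13) and (14) are both combinations of the binary entropy function, so the mutual information of Eq. (7) can be evaluated in a few lines. The following is a minimal sketch; the function names and the convention $0 \ln 0 = 0$ at the end points are our choices.

```python
import math

def binary_entropy(p):
    """-p ln p - (1 - p) ln(1 - p), with the convention 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in (p, 1.0 - p) if x > 0.0)

def mutual_information(m, q, a):
    """Eq. (7) from the overlap m, the neural activity q and the pattern activity a."""
    gamma0 = (q - a * m) / (1.0 - a)                                      # Eq. (10)
    s_out = binary_entropy(q)                                             # Eq. (13)
    s_cond = a * binary_entropy(m) + (1.0 - a) * binary_entropy(gamma0)   # Eq. (14)
    return s_out - s_cond
```

Two limits give a quick consistency check: for perfect recall ($m = 1$, $q = a$) the equivocation vanishes and the result is the pattern entropy, while for $m = q$ the output carries no information about the pattern.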

III. Self-control dynamics 

It is standard knowledge (e.g., Derrida et al., 1987; Bolle, 
2004) that the synchronous dynamics for diluted architectures 
can be solved exactly following the method based upon a 
signal-to-noise analysis of the local field (1) (e.g., Amari,
1977; Amari & Maginu, 1988; Okada, 1996; Bolle, 2004 and 
references therein). Without loss of generality we focus on 
the recall of one pattern, say $\mu = 1$, meaning that only $m_t^1$
is macroscopic, i.e., of order 1, and the rest of the patterns
cause a cross-talk noise at each time step of the dynamics.
Supposing that the initial state of the network model, $\{\sigma_{i,0}\}$,
is a collection of iidrv with mean zero and neural activity $q_0$
and correlated only with memorized pattern 1 with an overlap
$m_0^1$, then the full time evolution can be shown to be given by



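$$m_{t+1}^1 = \langle F_{\theta_t,\beta}((1 - a)M_t^1 + \omega_t) \rangle_{\omega}, \tag{15}$$
$$q_{t+1} = a\,m_{t+1}^1 + (1 - a) \langle F_{\theta_t,\beta}(-aM_t^1 + \omega_t) \rangle_{\omega}, \tag{16}$$

with

$$M_t^1 = \frac{m_t^1 - q_t}{1 - a}, \tag{17}$$

where we have averaged over the first pattern and where
the angular brackets indicate that we still have to average over
the residual (cross-talk) noise $\omega_t$ which can be written as

$$\omega_t = [\alpha Q_t]^{1/2}\, \mathcal{N}(0,1), \qquad Q_t = (1 - 2a)q_t + a^2, \tag{18}$$

with $\mathcal{N}(0,1)$ a Gaussian random variable with mean zero and
variance unity and the (finite) loading $\alpha$ defined by $P = \alpha C$.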



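Recalling the specific form of the transfer function (2) we
explicitly have

$$\langle F_{\theta_t,\beta}(-aM_t^1 + \omega_t) \rangle_{\omega} = \int_{-\infty}^{\infty} \frac{dx\, e^{-x^2/(2\alpha Q_t)}}{2\sqrt{2\pi\alpha Q_t}} \left[1 + \tanh[\beta(-aM_t^1 - \theta_t + x)]\right] \tag{19}$$

and an analogous expression for $\langle F_{\theta_t,\beta}((1 - a)M_t^1 + \omega_t) \rangle_{\omega}$.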
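
The Gaussian average in Eq. (19) has no closed form at finite temperature, but it is easy to evaluate numerically. The sketch below uses a plain uniform grid over ten standard deviations, which is our choice of quadrature and not taken from the text, and compares the result with the zero-temperature limit, where the average reduces to a complementary error function. All function and argument names are hypothetical.

```python
import math
import numpy as np

def noise_term(a, M, theta, beta, alpha, Q, n_points=20001, width=10.0):
    """Right-hand side of Eq. (19) on a uniform grid (our choice of quadrature)."""
    var = alpha * Q
    s = math.sqrt(var)
    x = np.linspace(-width * s, width * s, n_points)
    dx = x[1] - x[0]
    weight = np.exp(-x**2 / (2.0 * var)) / (2.0 * math.sqrt(2.0 * math.pi * var))
    integrand = weight * (1.0 + np.tanh(beta * (-a * M - theta + x)))
    return float(np.sum(integrand) * dx)

def noise_term_zero_temperature(a, M, theta, alpha, Q):
    """beta -> infinity limit: probability that the noise exceeds theta + a M."""
    return 0.5 * math.erfc((theta + a * M) / math.sqrt(2.0 * alpha * Q))
```

For a very large `beta` the two functions should agree closely, which is a convenient check on the grid resolution before the integral is used inside the dynamics.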

Of course, it is known that the quality of the recall process 
is influenced by the cross-talk noise at each time step of 
the dynamics. A novel idea is then to let the network itself 
autonomously counter this cross-talk noise at each time step 
by introducing an adaptive, hence time-dependent, threshold. 
This has been studied for neural network models at zero 
temperature, i.e., without synaptic noise where $F_{\theta_t,\beta=\infty}(x) =
\Theta(x - \theta_t)$. For sparsely coded models, meaning that the pattern
activity $a$ is very small and tends to zero for $N$ large, it has
been found (Dominguez & Bolle, 1998; Bolle, Dominguez &
Amari, 2000) that

$$\theta_t(a) = c(a)\sqrt{\alpha Q_t}, \qquad c(a) = \sqrt{-2\ln(a)} \tag{20}$$

makes the second term on the r.h.s. of Eq. (16) asymptotically
vanish faster than $a$ such that $q \sim a$.
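
The statement about the threshold (20) can be checked directly at zero temperature. With a step transfer function the noise average in Eq. (16) is the probability that a Gaussian of variance $\alpha Q_t$ exceeds the threshold; inserting Eq. (20) gives $\frac{1}{2}\mathrm{erfc}(\sqrt{-\ln a})$, independent of $\alpha Q_t$. The sketch below neglects the small shift $aM_t^1$ in the argument, which is an assumption of ours, and the function name is hypothetical.

```python
import math

def residual_noise_over_a(a):
    """Ratio of the zero-temperature noise term of Eq. (16) to a, with threshold (20).

    The shift a * M_t in the argument is neglected (assumption of this sketch).
    """
    return 0.5 * math.erfc(math.sqrt(-math.log(a))) / a

for a in (1e-2, 1e-4, 1e-8):
    print(a, residual_noise_over_a(a))
```

The ratio is below one and keeps decreasing as $a$ is lowered, although only slowly, consistent with the noise term vanishing faster than $a$.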

It turns out that the inclusion of this self-control threshold 
considerably improves the quality of the fixed-point retrieval 
dynamics, in particular the storage capacity, the basins of 
attraction and the information content. As an example we 
present in Fig. 1 the basin of attraction for the whole retrieval 
phase R for the self-control model with $\theta_{sc}$ given by Eq. (20)
and initial value $q_0 = 0.01 = a$, compared with a model where
the threshold $\theta_{opt}$ is selected for every loading $\alpha$ by hand in
an optimal way meaning that the information content $i = \alpha I$
is maximized. The latter is non-trivial because it is even rather 
difficult, especially in the limit of sparse coding, to choose a 
threshold interval by hand such that i is non-zero. The basin 
of attraction is clearly enlarged with this self-control threshold 
choice and even near the border of critical storage the results 
are still improved. For more details we refer to Dominguez & 
Bolle (1998) and Bolle, Dominguez & Amari (2000). A similar
threshold also works for sparsely coded sequential patterns 
(Kitani & Aoyagi, 1998) and even for non-sparse architectures 
as well (Bolle & Dominguez Carreta, 2000). 

It is then worthwhile to examine whether such a self-control 
threshold can be found for networks with synaptic noise. No 
systematic study has been done in this case. The specific 
problem to be posed in analogy with the zero-temperature 
case is the following one. Can one determine a form for 
the threshold $\theta_t$ in Eq. (19) such that the integral vanishes
asymptotically faster than a? 

In contrast with the zero-temperature case, where due to the 
simple form of the transfer function, this threshold could be 
determined analytically (recall Eq. (20)), a detailed study of the
asymptotics of the integral in Eq. (19) gives no satisfactory
analytic solution. Therefore, we have designed a systematic 
numerical procedure through the following steps: 

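• Choose a small value for the activity $a'$.
• Determine through numerical integration the threshold $\theta'$
such that

$$\int_{-\infty}^{\infty} \frac{dx\, e^{-x^2/(2\sigma^2)}}{\sigma\sqrt{2\pi}}\, \Theta(x - \theta) < a' \quad \text{for } \theta > \theta' \tag{21}$$

for different values of the variance $\sigma^2 = \alpha Q_t$.
• Determine, as a function of the temperature $T = 1/\beta$,
the value for $\theta'_T$ such that

$$\int_{-\infty}^{\infty} \frac{dx\, e^{-x^2/(2\sigma^2)}}{2\sigma\sqrt{2\pi}} \left[1 + \tanh[\beta(x - \theta)]\right] < a' \quad \text{for } \theta > \theta' + \theta'_T. \tag{22}$$

Fig. 1

The basin of attraction as a function of $\alpha$ for $a = 0.01$ and
initial $q_0 = a$ for the self-control model (full line) and the
optimal threshold model (dashed line) at zero temperature.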
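
The three steps of the numerical procedure can be sketched as follows. The text does not say which quadrature or root-finding method was used, so the uniform grid and the bisection below are assumptions, as are all function names; the sketch does not attempt to reproduce Fig. 2 or the polynomial fit.

```python
import math
import numpy as np

def tail_zero_temperature(theta, sigma2):
    """Left-hand side of Eq. (21), in closed form."""
    return 0.5 * math.erfc(theta / math.sqrt(2.0 * sigma2))

def tail_finite_temperature(theta, sigma2, T, n_points=20001, width=12.0):
    """Left-hand side of Eq. (22) on a uniform grid (our choice of quadrature)."""
    s = math.sqrt(sigma2)
    x = np.linspace(-width * s, width * s, n_points)
    dx = x[1] - x[0]
    weight = np.exp(-x**2 / (2.0 * sigma2)) / (2.0 * s * math.sqrt(2.0 * math.pi))
    return float(np.sum(weight * (1.0 + np.tanh((x - theta) / T))) * dx)

def smallest_threshold(tail, target, tol=1e-9):
    """Bisection for the theta where the decreasing function tail(theta) crosses target.

    Assumes target < 0.5, so that tail(0) lies above it.
    """
    lo, hi = 0.0, 1.0
    while tail(hi) > target:   # enlarge the bracket until the tail is below target
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi

def temperature_part(a_prime, sigma2, T):
    """Extra threshold needed at temperature T on top of the zero-temperature value."""
    theta0 = smallest_threshold(lambda th: tail_zero_temperature(th, sigma2), a_prime)
    theta = smallest_threshold(lambda th: tail_finite_temperature(th, sigma2, T), a_prime)
    return theta - theta0
```

Scanning `temperature_part` over a range of `T` for a few values of `a_prime` and `sigma2` yields the kind of data to which a polynomial in $T$ can then be fitted.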



The second step leads, as expected, precisely to a threshold
having the zero-temperature form Eq. (20). The third step
determining the temperature dependent part $\theta'_T$ leads to the
results shown in Fig. 2. Intuitively it is seen that $\theta'_T$ behaves
quadratically. Indeed, making a polynomial fit of these results
we find that the linear term is negligible and that the quadratic
term is of the form $\theta'_T = -\frac{1}{2}\ln(a')\,T^2$. Furthermore, the
dependence of the coefficient of this quadratic term on the
variance is very weak. Hence, we propose the following self-control
threshold

$$\theta_t(a, T) = \sqrt{-2\ln(a)\,\alpha Q_t} - \frac{1}{2}\ln(a)\,T^2. \tag{23}$$

Fig. 2

The temperature dependent part of the threshold $\theta'_T$ as a
function of $T$ for several values of $a'$.

Together with Eqs. (15)-(16) this relation describes the self-control
dynamics of the network model with synaptic noise.
This dynamical threshold is again a macroscopic parameter, 
thus no average must be taken over the microscopic random 
variables at each time step t. 
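
Because the threshold depends only on the macroscopic quantities $a$, $\alpha$, $q_t$ and $T$, the self-control dynamics closes on the pair $(m_t, q_t)$ and can be iterated directly. The following is a minimal sketch of Eqs. (15)-(18) with the threshold (23); the grid quadrature, the number of steps and the function names are assumptions, and $T > 0$ is required.

```python
import math
import numpy as np

def gaussian_average(f, variance, n_points=4001, width=10.0):
    """Average of f(omega) over omega ~ N(0, variance) on a uniform grid."""
    s = math.sqrt(variance)
    x = np.linspace(-width * s, width * s, n_points)
    dx = x[1] - x[0]
    pdf = np.exp(-x**2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)
    return float(np.sum(pdf * f(x)) * dx)

def self_control_threshold(a, alpha, Q, T):
    """Eq. (23): zero-temperature form plus the quadratic temperature correction."""
    return math.sqrt(-2.0 * math.log(a) * alpha * Q) - 0.5 * math.log(a) * T**2

def iterate(m0, q0, a, alpha, T, n_steps=300):
    """Iterate Eqs. (15)-(16) with the adaptive threshold; returns the final (m, q)."""
    beta = 1.0 / T
    m, q = m0, q0
    for _ in range(n_steps):
        Q = (1.0 - 2.0 * a) * q + a**2                  # Eq. (18)
        theta = self_control_threshold(a, alpha, Q, T)
        M = (m - q) / (1.0 - a)                         # Eq. (17)
        F = lambda h: 0.5 * (1.0 + np.tanh(beta * (h - theta)))   # Eq. (2)
        m_next = gaussian_average(lambda w: F((1.0 - a) * M + w), alpha * Q)
        q_next = a * m_next + (1.0 - a) * gaussian_average(
            lambda w: F(-a * M + w), alpha * Q)
        m, q = m_next, q_next
    return m, q
```

Calling `iterate` for a range of initial overlaps `m0` at fixed `a`, `alpha` and `T` gives a rough picture of the basin of attraction; dropping the $T^2$ term in `self_control_threshold` gives the zero-temperature threshold for comparison.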

At this point we want to make two remarks. First, for a 
binary layered network (Bolle & Massolo, 2000) the inclusion 
of a threshold of the form (20), although not designed for
non-zero temperatures, is shown to still improve the retrieval 
quality for low pattern activities and low temperatures, in 
comparison with an optimal threshold model analogous to 
the one mentioned above. Secondly, in a recent study of an 
extremely diluted three-state neural network (Dominguez et 
al., 2002) based on information theoretic and mean-field theory 
arguments, a self-control threshold with a linear temperature 
correction term with coefficient 1 has been mentioned without 
any further details. In that specific model this self-control 
threshold is shown to improve the retrieval quality for low 
temperatures but it is not specified how much of the improve- 
ment is really due to the linear correction itself. 
Fig. 3

The basin of attraction as a function of $\alpha$ for $a = 0.01$ and
several values of the temperature with (full lines) and
without (dashed lines) the temperature correction $\theta'_T$ in the
threshold.

We have solved this self-control dynamics, Eqs. (15)-(16)
and Eq. (23), for our model with synaptic noise, in the limit
of sparse coding, numerically. In particular, we have studied in
detail the influence of the temperature dependent part of the
threshold. Of course, we are only interested in the retrieval
solutions with $M_t^1 > 0$ and carrying a non-zero information $I$.



We remark that all numerical calculations presented here are 
done for an appropriate number of time steps (at least of 
the order of a few hundred) in order to assure that a stable 
equilibrium point is reached. 
Fig. 4

The evolution of the overlap $m_t$ for several initial values $m_0$
with $a = 0.01$, $T = 0.2$ and $\alpha = 1.5$ without (left) and with
(right) the temperature correction $\theta'_T$ in the threshold.

The important features of the solution are illustrated in 
Figs. 3-5. In Fig. 3 we show the basin of attraction for the 
whole retrieval phase for the model with the temperature-zero 
threshold (20) (dashed curves) compared to the model with the
temperature dependent threshold (23) (full curves) (compare
also Fig. 1). We see that there is no clear improvement for 
low temperatures but there is a substantial one for higher 
temperatures. Even near the border of critical storage the 
results are still improved such that also the storage capacity 
itself is larger.

This is further illustrated in Fig. 4 where we compare 
the time evolution of the retrieval overlap mt starting from 
several initial values, mo, for the model with (right figure) 
and without (left figure) the quadratic temperature correction 
in the threshold. Here this temperature correction is absolutely 
crucial to force some of the overlap trajectories to go to 
the retrieval attractor $m \approx 1$. It really makes the difference
between retrieval and non-retrieval in the model. At this point 
we remark that the influence of a linear temperature correction 
term has been examined also here but no real improvement has 
been found of the results for the temperature-zero threshold. 

In Fig. 5 we plot the information content i as a function 
of the temperature for the self-control dynamics with the 
threshold (23) (full curves), respectively (20) (dashed curves).
We see that, especially for small loading $\alpha$, a substantial
improvement of the information content is obtained. 

IV. Conclusions 

In this work we have generalized complete self-control in 
the dynamics of sparsely coded associative memory networks 
to models with synaptic noise. We have proposed an analytic 
form for the relevant macroscopic threshold consisting of
the known form for temperature zero plus a quadratic temperature
correction term dependent on the pattern activity. The
consequences of this self-control mechanism on the quality of
the recall process by the network have been studied.

Fig. 5

The information content $i$ as a function of $T$ for several
values of the loading $\alpha$ and $a = 0.001$ with (full lines) and
without (dashed lines) the temperature correction $\theta'_T$ in the
threshold.

We find that the basins of attraction of the retrieval solutions 
as well as the storage capacity are enlarged and that the 
mutual information content is maximized. This confirms the 
considerable improvement of the quality of recall by self- 
control, also for network models with synaptic noise. 

This allows us to conjecture that this idea of self-control, 
allowing the network to function autonomously, might be 
relevant for other architectures in the presence of synaptic 
noise, and for dynamical systems in general, when trying to 
improve the basins of attraction and convergence times. 